Interpretable machine learning models for predicting the incidence of antibiotic-associated diarrhea in elderly ICU patients

Background Antibiotic-associated diarrhea (AAD) can prolong hospitalization, increase medical costs, and even lead to higher mortality rates. Therefore, it is essential to predict the incidence of AAD in elderly intensive care unit (ICU) patients. The objective of this study was to create a prediction model that is both interpretable and generalizable for predicting the incidence of AAD in elderly ICU patients. Methods We retrospectively analyzed data from the First Medical Center of the People's Liberation Army General Hospital (PLAGH) in China. We used the machine learning model Extreme Gradient Boosting (XGBoost) and the SHapley Additive exPlanations (SHAP) method to predict the incidence of AAD in elderly ICU patients in an interpretable manner. Results A total of 848 adult ICU patients were eligible for this study. The XGBoost model predicted the incidence of AAD with an area under the receiver operating characteristic curve (AUC) of 0.917, sensitivity of 0.889, specificity of 0.806, accuracy of 0.870, and an F1 score of 0.780. The XGBoost model outperformed the other models, including logistic regression, support vector machine (AUC = 0.809), K-nearest neighbor algorithm (AUC = 0.872), and naive Bayes (AUC = 0.774). Conclusions Although the XGBoost model may not excel in absolute performance, it outperformed the other models in predicting, from patient characteristics, the incidence of AAD in elderly ICU patients.


Background
Antibiotic-associated diarrhea (AAD) is diarrhea that occurs after antibiotic administration and cannot be attributed to any other etiology. The prevalence of AAD ranges from 5 to 35% [1][2][3]. Critically ill patients in intensive care units (ICU) exhibit a higher incidence of AAD due to the complexity of their conditions, the diverse array of antibiotics used, and the frequent use of antibiotic combinations [4][5][6]. With the aging demographic, the proportion of elderly patients in the ICU has risen. As the elderly population experiences a reduction in beneficial commensal bacteria, the intestinal barrier becomes more vulnerable. Consequently, the elderly are more susceptible to the effects of antibiotic use, leading to an elevated risk of AAD. The occurrence of AAD prolongs hospitalization, escalates medical expenses, and may even contribute to increased mortality [7][8][9]. Early identification of patients at risk of AAD is critical and may facilitate timely prevention and intervention. This study aimed to construct a predictive model for AAD risk using data from the Department of Critical Care Medicine at the First Medical Center of the People's Liberation Army General Hospital (PLAGH). The SHapley Additive exPlanations (SHAP) method was used to explain the predictive model, so that it not only predicts outcomes but also provides a rationale for each prediction, which is intended to strengthen user confidence in the model.

Methods
We performed a longitudinal, single-center, retrospective study based on the PLAGH database. Reporting followed the TRIPOD checklist.

Study population
Data on patients who were admitted to the Department of Critical Care Medicine at the First Medical Center of the General Hospital of the People's Liberation Army (PLA) from January 1, 2020, to June 30, 2022, and treated with antibiotics were retrospectively analyzed. Inclusion criteria: (1) aged 60 years or older; (2) received antibiotic treatment within 7 days of admission to the ICU; (3) absence of diarrhea symptoms upon admission to the ICU. Exclusion criteria: (1) ICU length of stay ≤ 2 days; (2) palliative care; (3) diarrhea symptoms upon admission to the ICU (including previous chronic gastrointestinal diseases such as irritable bowel syndrome, ischemic bowel disease, and inflammatory bowel disease, as well as acute gastrointestinal conditions such as food poisoning, acute gastroenteritis, and laxative use); (4) postoperative gastrointestinal tumors (i.e., admitted to the ICU with a jejunostomy, an ileostomy, or a colostomy); (5) missing clinical information.

Grouping
Patients were grouped according to the AAD diagnostic criteria: those who met the criteria were assigned to the AAD group, and those who did not were assigned to the control group. The diagnostic criteria required the absence of diarrhea prior to admission and recent or current use of antimicrobial drugs. Diarrhea in this context was defined as three or more loose or watery stools per day, along with bloody or mucus-pus-blood stools, fever, abdominal pain, and other specific criteria. Other potential causes of diarrhea, such as underlying conditions and improper care, were excluded [10].

Data extraction
We collected baseline characteristics of patients within the first 24 h of ICU admission and clinical and pharmacologic measures within 7 days of ICU admission. Demographic parameters included age, gender, and body mass index (BMI). Clinical treatment measures included mechanical ventilation, continuous renal replacement therapy (RRT), and enteral nutrition. Laboratory parameters included hemoglobin (Hb), C-reactive protein (CRP), interleukin-6 (IL-6), platelet count (Plt), procalcitonin (Pct), albumin (Alb), serum creatinine (Scr), serum phosphorus (P), amylase, and lipase. Pharmacologic interventions included third-generation cephalosporin antibiotics (ceftazidime, ceftriaxone, cefoperazone sodium/sulbactam sodium), a carbapenem antibiotic (meropenem), glycopeptide antibiotics (teicoplanin, vancomycin), a tetracycline antibiotic (tigecycline), a penicillin antibiotic (piperacillin sodium/tazobactam sodium), an oxazolidinone antibiotic (linezolid), an anti-anaerobic antibiotic (ornidazole), antifungal agents (fluconazole, caspofungin), and sedatives and analgesics (propofol, dexmedetomidine, midazolam, butorphanol). Disease severity was assessed using the Acute Physiology and Chronic Health Evaluation II (APACHE II) [11] and Sequential Organ Failure Assessment (SOFA) score [12]. Study outcomes included the length of ICU stay and hospital mortality.
The study complied with the Declaration of Helsinki and was approved by the Ethical Committee of the General Hospital of the People's Liberation Army (PLA) (S2017-054-02). Because this was a retrospective observational study, the Ethical Committee deemed informed consent unnecessary.
All computations and analyses were performed using Python version 3.9. Continuous variables were represented as means ± standard deviations (SDs) or medians and interquartile ranges (IQRs). Categorical variables were presented as totals and percentages.
Group comparisons were conducted using the Kruskal-Wallis test for continuous variables, and the chi-square test and ANOVA for categorical variables. Statistical significance was defined as a p-value less than 0.05. Variables with more than 40% missing values were excluded from further analysis, and the overall median was used to impute the remaining missing data. The study cohort was randomly split, with 70% of the data used for model training and 30% for model testing. LASSO regression analysis was used to identify the variables that could predict the likelihood of developing AAD. Five machine learning methods (XGBoost, logistic regression [LR], support vector machine [SVM], k-nearest neighbor [KNN], and naive Bayes [NB]) were used to develop predictive models. Key hyperparameters of XGBoost were set as follows: the learning rate (learning_rate = 0.1), the maximum depth of each tree (max_depth = 3), and the number of sequentially built trees (n_estimators = 20). Evaluation metrics included the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F1 score. The F1 score combines the precision and recall of a classifier into a single score ranging from 0 to 1 [13]. Precision is defined as TP/(TP + FP) (where TP denotes true positives and FP denotes false positives) and measures the proportion of predicted positives that are truly positive. Recall is defined as TP/(TP + FN) (where FN denotes false negatives) and measures the proportion of actual positives that the model identifies. The F1 score is defined as 2 × (precision × recall)/(precision + recall), representing a balance between precision and recall [14,15]. SHAP values were used to interpret the prediction models. They offer a unified approach for interpreting the outcomes of any machine learning model and provide consistent and locally accurate attribution values for each feature [16,17].
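The metric definitions above can be made concrete with a small worked example. The following is a minimal sketch in plain Python, not the study's code: the helper names (`count_outcomes`, `classification_metrics`) and the toy labels are hypothetical and chosen only to illustrate how precision, recall (sensitivity), specificity, accuracy, and the F1 score follow from the TP, FP, TN, and FN counts.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def count_outcomes(y_true, y_pred):
    """Tally TP/FP/TN/FN for binary labels (1 = AAD, 0 = no AAD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return ConfusionCounts(tp, fp, tn, fn)


def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c):
    """Compute the metrics named in the text from the four counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)        # also called sensitivity
    specificity = safe_div(c.tn, c.tn + c.fp)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical toy labels for illustration only (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)
metrics = classification_metrics(counts)
print(counts, metrics)
```

In this toy case there are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so every metric evaluates to 0.75.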

Baseline characteristics of included patients
We analyzed a total of 848 eligible adult patients. The flow chart of patient recruitment is shown in Fig. 1. The dataset was split randomly into two sections: 70% of the data was used for training the model and 30% for testing it (Table 1). The occurrence of AAD was 22.32% (139 out of 596) in the training set and 21.82% (55 out of 252) in the testing set.
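The preprocessing and splitting steps described in the Methods (dropping variables with more than 40% missing values, median imputation, and a random 70/30 split) might look roughly as follows. This is a sketch under stated assumptions rather than the authors' pipeline: the function names, the column names, the toy rows, and the random seed are all hypothetical, and the exact split sizes reported above (596/252) are not reproduced here.

```python
import random
from statistics import median


def drop_sparse_columns(rows, max_missing=0.40):
    """Drop columns whose share of missing values (None) exceeds max_missing."""
    keep = []
    for col in rows[0].keys():
        missing = sum(1 for r in rows if r[col] is None)
        if missing / len(rows) <= max_missing:
            keep.append(col)
    return [{col: r[col] for col in keep} for r in rows]


def impute_median(rows):
    """Fill remaining gaps with the overall median of each column."""
    filled = [dict(r) for r in rows]
    for col in rows[0].keys():
        observed = [r[col] for r in rows if r[col] is not None]
        col_median = median(observed)
        for r in filled:
            if r[col] is None:
                r[col] = col_median
    return filled


def split_70_30(rows, seed=0):
    """Shuffle with a fixed seed and cut at 70% of the rows."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy cohort: "il6" is 60% missing, "hb" is 20% missing.
raw = [
    {
        "crp": float(i),
        "hb": 100.0 + i if i % 5 else None,
        "il6": None if i < 6 else 1.0,
    }
    for i in range(10)
]
clean = impute_median(drop_sparse_columns(raw))
train, test = split_70_30(clean)
print(len(train), len(test), sorted(clean[0].keys()))
```

The order used here (impute on the whole cohort, then split) follows the text's wording of an "overall median"; computing the median on the training portion only is a common alternative that keeps test-set information out of preprocessing.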
Fig. 1 The flow chart of patient recruitment

Modeling
A total of 37 variables measured at admission were included in the LASSO regression analysis. After LASSO regression selection (see Fig. 2), 10 variables were identified as predictors of AAD occurrence. These variables included hemoglobin, C-reactive protein, use of tigecycline, butorphanol, vancomycin, linezolid, fluconazole, meropenem, enteral nutrition, and renal replacement therapy. We employed various machine learning techniques, including XGBoost, LR, SVM, KNN, and NB, to predict the occurrence of AAD in elderly ICU patients using all available variables as input features. XGBoost achieved the highest AUC on the test dataset (0.917, 95% confidence interval = 0.881-0.948) (Fig. 3; Table 2).
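To illustrate what an AUC with a 95% confidence interval measures, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and then estimates an interval with a percentile bootstrap. The source does not state how its interval was obtained, so the bootstrap is an assumption made for illustration; the function names and toy scores are likewise hypothetical.

```python
import random


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_y = [y_true[i] for i in idx]
        if len(set(resampled_y)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(resampled_y, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical predicted probabilities for eight patients (not study data).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.5, 0.9]
auc = roc_auc(y_true, scores)
ci_low, ci_high = bootstrap_auc_ci(y_true, scores, n_boot=200)
print(auc, ci_low, ci_high)
```

For the toy data, 14 of the 16 positive-negative pairs are ranked correctly, giving an AUC of 0.875. The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients.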

Model evaluation
Brier scores and decision curve analysis (DCA) were used to evaluate the predictive models. XGBoost achieved a lower (better) Brier score than the other models. Figure 4 shows the calibration curves for the five models. The DCA shows that the XGBoost model can be used as a tool to predict the occurrence of AAD (Fig. 5). We also applied K-fold cross-validation to evaluate model performance, as shown in Table 3.
In the two individual explanations (Fig. 6C, D), the AAD patient was predicted to be at high risk of AAD because of an elevated C-reactive protein (CRP) level (9.71 mg/L), a high serum phosphorus (P) level (0.37 mmol/L), an elevated procalcitonin (PCT) level (0.447 ng/mL), and the use of enteral nutrition. Conversely, the non-AAD patient was predicted to be at low risk because of a lower procalcitonin level (0.066 ng/mL), a lower lipase level (13.4 U/L), a normal platelet (PLT) count (188 × 10^9/L), a lower CRP level (0.87 mg/L), and a younger age (60 years). Figure 7 presents the SHAP dependence plots for the top 12 important variables. Elevated levels of procalcitonin, interleukin-6, lipase, and C-reactive protein, as well as older age, vancomycin use, and enteral nutrition, were associated with a higher incidence of AAD. Conversely, lower levels of hemoglobin, serum phosphorus, and platelets were linked to a higher incidence of AAD. The use of the sedative propofol may reduce the incidence of AAD in elderly ICU patients. Finally, a confusion matrix was used to display the prediction outcomes of the XGBoost model, which had a positive predictive value of 84.6% and a negative predictive value of 86.6%.
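The two evaluation tools named at the start of this section, the Brier score and decision curve analysis, both reduce to short formulas. The sketch below shows the Brier score as the mean squared difference between predicted probability and observed outcome, and the net benefit that a decision curve plots at a given threshold probability, using the standard definition TP/n − FP/n × pt/(1 − pt). The function names and toy values are hypothetical and are not taken from the study.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    threshold must lie strictly between 0 and 1.
    """
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - fp / n * (threshold / (1 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in a decision curve: treat every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))


# Hypothetical outcomes and predicted risks for five patients (not study data).
y_true = [1, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1]

brier = brier_score(y_true, probs)
nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(brier, nb_model, nb_all)
```

A decision curve repeats the net-benefit calculation over a range of thresholds and compares the model with the "treat all" and "treat none" (net benefit 0) strategies; a model is useful at thresholds where its curve lies above both.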

Discussion
In this study, we developed and internally validated a machine learning algorithm using 37 features to predict the occurrence of AAD in elderly patients in the ICU. The XGBoost model outperformed LR, SVM, KNN, and NB. The variables needed to calculate the risk of AAD are typically readily available at the time of admission. Additionally, we used SHAP to interpret the XGBoost model, which should help physicians understand the model's decision-making process. Early and aggressive preventive measures are imperative if a patient is at high risk of developing AAD.
The widespread use of broad-spectrum antibiotics in recent years has raised global concerns about the incidence of AAD in elderly ICU patients [18]. AAD can extend the length of hospital stay, raise healthcare expenses, and potentially contribute to higher mortality [10,19,20]. As a result, it is essential to prevent AAD and to identify and treat it as early as possible. Most current studies focus on analyzing the risk factors for AAD in ICU patients. For example, a single-center retrospective study found that advanced age, prolonged ICU stay, extended use of proton pump inhibitors, and prolonged antibiotic use were associated with a higher risk of AAD in elderly ICU patients [21]. No study has developed a model to predict the risk of AAD in elderly ICU patients. SHAP was also used to interpret the predictive model, enabling it not only to forecast the expected outcome but also to offer a rational explanation for the prediction, which can enhance the user's trust in the model.
In our investigation, the administration of sedative and analgesic medications (specifically propofol, butorphanol, and remifentanil) was associated with a decreased risk of AAD. This reduction is likely attributable to the inhibitory effects of opioids on gastrointestinal motility, resulting in reduced bowel movements [22,23]. Reduced motility may, however, disturb the gut microbiota and intestinal barrier function, thereby increasing the likelihood of bacterial translocation [24]. The overall impact of these drugs may therefore not necessarily be protective.
Previous research has indicated that nearly all classes of antibiotics may contribute to the onset of AAD [25,26].
A retrospective analysis revealed that cefoperazone/sulbactam and piperacillin/tazobactam resulted in a similar incidence of AAD [27]. Nevertheless, comparative studies examining the effects of different antibiotics on AAD incidence are lacking. The antibiotics among the top 20 risk factors with the highest predictive value in our study were vancomycin, linezolid, fluconazole, and piperacillin sodium-tazobactam sodium. Among these, vancomycin had the greatest impact on predicting the occurrence of AAD in elderly ICU patients.
The study has a few potential limitations. Firstly, it employed a small sample size for model development and lacked an external validation cohort. Consequently, future multicenter studies with larger sample sizes are imperative to assess the model's generalizability. Secondly, the medication therapy considered in the training and testing datasets only encompassed antibiotics and sedative-analgesic medications, neglecting other medications that could significantly influence the incidence of AAD. This oversight may limit the model's applicability.

Conclusion
In summary, we developed five different AAD prediction models and evaluated them using AUROC, Brier score, and DCA to select the best-performing model, which was XGBoost. We hope that this model can aid physicians in early intervention and treatment, potentially reducing the length of ICU hospitalization and healthcare costs for elderly patients.

Fig. 2
Fig. 3 Predictive performance on the training dataset (A) and testing dataset (B) evaluated by machine learning methods. Log Reg, logistic regression; SVM, support vector machine; KNN, k-nearest neighbor; Tree, decision tree; NB, naive Bayes; XGBoost, extreme gradient boosting

Based on the above, the XGBoost model outperformed the other four machine learning models. We therefore applied SHAP to explain the XGBoost model. Fig. 6 illustrates the ranking of the top 20 risk factors and their importance. The SHAP value, represented on the x-axis, is a unified measure of a feature's effect on the model output. Each row in the feature importance chart plots every patient's attribution to the outcome as a colored dot, with red and blue dots indicating high and low feature values, respectively. A higher SHAP value for a feature indicates a higher predicted risk of AAD. The top 20 variables are presented in descending order of mean absolute SHAP value. Additionally, the model's predictions are interpreted for two samples from the dataset. This interpretation shows how each feature pushes the model output away from the base value: features that increase the prediction are depicted in red, and those that decrease it are shown in blue.

Fig. 4 Calibration plots of five models. XGBoost achieved a lower (better) Brier score (0.146) than the other models

Fig. 6 The model's interpretation. A, The importance ranking of the top 20 risk factors. The SHAP value (x-axis) is a unified index of the effect of a feature in the model. In each feature importance row, all patients' attributions to the outcome are plotted as colored dots, in which red (blue) dots represent high (low) feature values. The higher the SHAP value of a feature, the higher the predicted risk of AAD for the patient. B, The importance ranking of the top 20 variables according to the mean (|SHAP value|). C, D, The interpretation of model prediction results for the two samples. This explanation shows how each feature contributes to pushing the model output from the base value to the final model output. Features pushing the prediction higher are shown in red, and those pushing the prediction lower are shown in blue. C, The AAD patient was predicted to develop AAD because of a high C-reactive protein (CRP) level (9.71 mg/L), a high serum phosphorus (P) level (0.37 mmol/L), a high procalcitonin (PCT) level (0.447 ng/mL), and the use of enteral nutrition; D, The non-AAD patient was predicted to have normal defecation function because of a lower procalcitonin level (0.066 ng/mL), a lower lipase level (13.4 U/L), a normal platelet (PLT) count (188 × 10^9/L), a lower C-reactive protein (CRP) level (0.87 mg/L), and a lower age (60 y). WBC, white blood cell; Pct, procalcitonin; Alb, albumin; Hb, hemoglobin; IL-6, interleukin-6; Scr, serum creatinine; Plt, platelet; CRP, C-reactive protein; SOFA, Sequential Organ Failure Assessment; BMI, body mass index; CCI, Charlson comorbidity index; PT, piperacillin/tazobactam
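The additive reading of panels C and D (base value plus per-feature contributions equals the model output) can be illustrated with exact Shapley values on a very small model. The sketch below is a toy illustration only: the risk function, its coefficients, the patient values, and the single background reference are all hypothetical, and it enumerates feature subsets directly rather than using the tree-specific algorithm that the SHAP library applies to XGBoost models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x against one background reference."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value; the rest take the
        # background value.
        mixed = [x[i] if i in subset else background[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (not the study model)."""
    crp, pct, enteral = features
    return 0.05 + 0.01 * crp + 0.2 * pct + 0.1 * enteral * pct


# Hypothetical patient and reference values: [CRP, PCT, enteral nutrition].
patient = [10.0, 0.5, 1.0]
background = [2.0, 0.1, 0.0]

base_value = toy_risk(background)
phi = shapley_values(toy_risk, patient, background)
print(base_value, phi, base_value + sum(phi), toy_risk(patient))
```

Here the contributions come out as 0.08 for CRP, 0.10 for PCT, and 0.03 for enteral nutrition, which sum with the base value of 0.09 to the patient's score of 0.30. This is the local accuracy property that lets each prediction be drawn as red and blue segments between the base value and the output.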

Table 1
Baseline characteristics of the two datasets

Table 2
The performance of each model for prediction. SE, sensitivity; SP, specificity; AC, accuracy; Log Reg, logistic regression; SVM, support vector machine; KNN, k-nearest neighbor; NB, naive Bayes; XGBoost, extreme gradient boosting

Table 3
The K-fold cross-validation of each model for prediction

Fig. 5 Decision curve analysis for five machine learning models. The XGBoost model can serve as the best predictor of AAD